A New Memory MapReduce Framework for Higher Access to Resources

Authors

  • ZuKuan Wei
  • Bo Hong
  • JaeHong Kim
Abstract

Demand for highly parallel data processing platforms is growing because of an explosion in the number of massive-scale data applications in both academia and industry. MapReduce is one of the most significant solutions for distributed computing on big data. This paper builds on Hadoop MapReduce. When processing massive data sets, MapReduce generates a large amount of dynamic data, but this data is discarded after the task completes. Meanwhile, much of this dynamic data is written to HDFS during task execution, which causes considerable unnecessary I/O cost. In this paper, we analyze existing distributed caching mechanisms and propose a new Memory MapReduce framework that responds in real time to read and write requests from task nodes and maintains related information about the cached data. Performance testing shows that MapReduce with the cache achieves significantly improved I/O performance.
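The caching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the class names, the per-entry metadata fields, the byte capacity limit, and the dictionary standing in for the slower file system are all assumptions made for illustration. It only shows the general pattern of serving read and write requests for intermediate data from memory while tracking information about each cached item.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """Metadata kept for one cached block (hypothetical fields)."""
    data: bytes
    created: float = field(default_factory=time.time)
    reads: int = 0


class IntermediateCache:
    """Hypothetical in-memory store for intermediate task output."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: Dict[str, CacheEntry] = {}
        # Stand-in for the slower distributed file system (assumption).
        self.spilled: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> bool:
        """Cache data if it fits, otherwise spill it. Returns True if cached."""
        self.delete(key)  # drop any older copy of this key first
        if self.used_bytes + len(data) > self.capacity_bytes:
            self.spilled[key] = data
            return False
        self.entries[key] = CacheEntry(data)
        self.used_bytes += len(data)
        return True

    def read(self, key: str) -> Optional[bytes]:
        """Serve from memory when possible, else from the spill store."""
        entry = self.entries.get(key)
        if entry is not None:
            entry.reads += 1
            return entry.data
        return self.spilled.get(key)

    def delete(self, key: str) -> None:
        """Remove a key from both stores and release its memory budget."""
        entry = self.entries.pop(key, None)
        if entry is not None:
            self.used_bytes -= len(entry.data)
        self.spilled.pop(key, None)
```

In this sketch a write that does not fit in memory falls back to the slower store, so the I/O cost is paid only for data that exceeds the cache budget.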


Similar resources

MapReduce-Style Computation in Distributed Virtual Memory

Many cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data handling. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience ...


An Enhanced Map Reduce Framework for Improving the Performance of Massively Scalable Private Clouds

Cloud computing systems provide access to large amounts of data and other resources through a large number of interfaces. Apache Hadoop is a framework that allows distributed processing of large data sets across clusters of computers. It is a powerful abstraction proposed for building scalable and fault-tolerant applications. In this paper we have suggested an enhanced framework for MapReduce wh...


I/O Throttling and Coordination for MapReduce

As a leading framework for data-intensive computing, MapReduce has gained enormous popularity in large-scale data analysis. With the increasing adoption of multi-core and many-core platforms, more and more MapReduce tasks now run on the same node and share the same storage resources. The concurrency of tasks raises the issue of I/O stream congestion. We have observed significant throughput drop...


A review of methods for resource allocation and operational framework in cloud computing

The issue of management and allocation of resources in cloud computing environments, according to the breadth of scale and modern technology implementation, is a complicated issue. Issues such as: the heterogeneity of resources, resource dependencies to each other, the dynamics of the environment, virtualization, workload diversity as well as a wide range of management objectives of cloud servi...


Enhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering

MapReduce is a software framework that allows certain kinds of parallelizable or distributable problems involving large data sets to be solved using computing clusters. This paper introduces our experience of grouping internet users by mining a huge volume of web access log of up to 500 gigabytes. The application is realized using hierarchical clustering algorithms with Map-Reduce, a parallel p...




Publication date: 2017